[00:00.000 --> 00:01.560]  Automation already has.
[00:03.680 --> 00:08.200]  So at this conference, we don't really need to explain what recon is or why it's good
[00:08.200 --> 00:14.180]  to do. So let's start with the question of why even automate? Why is there a talk about
[00:14.180 --> 00:20.160]  automation? Because if you're watching this, you obviously like to hack. So why would you
[00:20.160 --> 00:25.900]  get away from doing that hacking manually? I think it's a good question. And my best
[00:25.900 --> 00:30.280]  answer is that you don't actually have to choose between automation and manual hacking.
[00:30.520 --> 00:35.920]  You can have both. More specifically, what I'm saying is you can use automated recon
[00:35.920 --> 00:42.080]  to feed your manual hacking. So while you're asleep or gaming or relaxing, you can still
[00:42.080 --> 00:49.540]  be finding stuff to explore later. So that's the why. That's why I think automation is
[00:49.540 --> 00:55.880]  cool because it goes hand in hand with manual testing. But this talk is about the how, not
[00:55.880 --> 01:01.920]  really the why. So one of the things I hear about a lot is breaking down tools or methodologies
[01:01.920 --> 01:07.580]  or security in general into vulnerability assessment or penetration testing or bounty
[01:07.580 --> 01:13.680]  hunting and making strong lines between these. But I actually think those distinctions are
[01:13.800 --> 01:17.960]  a bit arbitrary when it comes to automation. And I actually prefer to abstract security
[01:17.960 --> 01:27.920]  into questions rather than categories. So I like to break down all my testing into individual,
[01:27.920 --> 01:33.800]  specific, distinct questions. And I get this idea from the Unix philosophy, which talks
[01:33.800 --> 01:39.520]  about making each program do one thing well and expecting the output of one thing to become
[01:39.520 --> 01:45.500]  the input of another. So I try to do the same thing with my automation. So on the right,
[01:45.500 --> 01:50.460]  you can see a number of questions I might want to know about a target. And I would say
[01:50.460 --> 01:57.560]  this applies almost every time. And I'm a very web-focused tester. So these are obviously
[01:57.560 --> 02:02.820]  very web-focused, but it's a giant list of questions. And it also includes network security
[02:02.820 --> 02:06.380]  as well, but it's a giant list of questions that I pretty much want to know about every
[02:06.380 --> 02:13.280]  target. So my automation is a way of asking and answering those questions for any arbitrary
[02:13.280 --> 02:21.870]  target. And that brings us here, where we have specific, discrete questions being answered
[02:21.870 --> 02:29.670]  by a specific piece of code. So in this case, we have two questions. What are their subdomains?
[02:29.670 --> 02:35.310]  Which is a question captured in checksubdomains.sh. And which of their subdomains are running
[02:35.310 --> 02:45.220]  web servers, which is captured in checkwebserver.sh. So that's what I'm going to talk about today,
[02:45.460 --> 02:51.040]  a way to ask lots of different security questions continuously, and then do fun things with
[02:51.040 --> 02:54.740]  those answers. And if you want to see what kinds of questions you might want to ask for
[02:54.840 --> 03:00.480]  a bug bounty, for example, I really recommend you start with a good methodology. And I think
[03:00.480 --> 03:08.440]  the best explanations of the power of methodologies come from my best buddy, Jason Haddix, who's
[03:08.440 --> 03:13.140]  also speaking here, by the way, at the same conference. Jason not only talks about the
[03:13.140 --> 03:17.800]  steps in his Bug Hunter's Methodology series, but he also talks about how to show them visually
[03:18.460 --> 03:24.380]  and maps them out into mind maps. It's really good stuff. And I really think he has the
[03:24.380 --> 03:33.050]  best content out there around web and recon methodologies. So let's look at a few examples
[03:33.050 --> 03:37.450]  of these. Let's start off with a simple case of where you know you have an external IP
[03:37.450 --> 03:43.430]  range. And you want to know what hosts are live in that range. The question there would
[03:43.430 --> 03:49.230]  be something like for a given IP range, which hosts are live? Or which are likely to be
[03:49.230 --> 03:56.250]  based on the fact that they're serving common services? That question produces a seed,
[03:56.250 --> 04:04.690]  live_ips.txt, that becomes the input to countless other modules. So before I go more into modules,
[04:04.690 --> 04:09.230]  I want to actually say something real quick about the level of the code in your automation.
[04:09.230 --> 04:14.250]  I think this is likely to come up for anyone who thinks about automation or has actually
[04:14.250 --> 04:18.950]  tried to write their own. I like to think of it as two extremes, with frameworks on
[04:18.950 --> 04:25.690]  one side and completely custom code on the other. So completely custom code is like you're
[04:25.690 --> 04:31.030]  writing in C or Go or something, and you're writing your own port scanner or something.
[04:31.030 --> 04:37.250]  So you're interacting fairly directly with the kernel, low level programming, right?
[04:37.250 --> 04:43.030]  And with frameworks, you're like calling Amass's enumerate function to get a list of
[04:43.030 --> 04:48.650]  IPs, for example. So I personally prefer to sit right in the middle of those two and write
[04:48.650 --> 04:56.010]  extremely small Unix-y modules that leverage a low level utility. So for example, I like to do
[04:56.010 --> 05:03.570]  my ASN and IP range lookups from ipinfo.io. So in my automation, I write small little wrappers
[05:03.570 --> 05:11.130]  for each of those discrete functions, rather than using a framework. I think that hybrid idea
[05:11.810 --> 05:17.370]  gives you a great combination of Unix-y like control, of very small things doing very small
[05:17.370 --> 05:24.930]  things, without needing to rewrite things like curl or nmap or masscan on your own.
[05:26.070 --> 05:29.610]  So that's the level. I like to live right there in the middle.
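A minimal sketch of that middle ground, applied to the earlier live-host question: a thin wrapper around nmap rather than a framework call or a hand-written scanner. The function names and the live_ips.txt file name are illustrative assumptions, not the speaker's actual scripts. The sketch assumes nmap is installed and uses its ping scan (-sn) with grepable output (-oG -).

```python
import subprocess
import sys


def parse_live_hosts(grepable_output):
    """Return the IPs that nmap's grepable output reports as up."""
    live = []
    for line in grepable_output.splitlines():
        # Host lines look like: "Host: 192.0.2.10 ()\tStatus: Up"
        if line.startswith("Host:") and "Status: Up" in line:
            live.append(line.split()[1])
    return live


def scan_range(ip_range):
    """Run an nmap ping scan (assumes nmap is on PATH) and return live IPs."""
    result = subprocess.run(
        ["nmap", "-sn", "-n", "-oG", "-", ip_range],
        capture_output=True, text=True, check=True,
    )
    return parse_live_hosts(result.stdout)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python live_hosts.py 192.0.2.0/24 > live_ips.txt
    for ip in scan_range(sys.argv[1]):
        print(ip)
```

The module answers one question and prints one IP per line, so its output can be redirected into a file that other modules read.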
[05:31.470 --> 05:37.350]  So here's another module that is fundamental to my workflow, which is just getting a full
[05:37.350 --> 05:44.830]  HTML output for a page. So here I'm using a headless Chromium browser via command line,
[05:44.830 --> 05:49.650]  so that the request gets accepted by as many web servers and
[05:49.650 --> 05:55.130]  defensive systems as possible, because curl is pretty much denied by a lot of things by now.
[05:55.650 --> 06:00.110]  And then once I have that raw HTML, I can do all sorts of stuff with it locally via
[06:00.110 --> 06:04.910]  different modules, like getting all the JavaScript files, parsing it to see if this
[06:04.910 --> 06:10.410]  page might be marked as sensitive, looking for artifacts that might indicate the application or
[06:10.410 --> 06:16.410]  tech stack, looking for fields that might be known to be vulnerable to certain injections,
[06:16.830 --> 06:22.290]  et cetera. And I have at least a dozen of these just for parsing HTML,
[06:22.910 --> 06:27.410]  but you have to have the HTML to start with. So this is a good fundamental
[06:28.550 --> 06:31.890]  module or script or piece of code to start with.
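As a rough sketch of that pair of modules, the code below dumps a page's rendered HTML with headless Chromium and then pulls out the JavaScript files locally. The binary name chromium is an assumption (it may be chromium-browser or google-chrome on a given system), and the function and class names are hypothetical rather than taken from the talk.

```python
import subprocess
from html.parser import HTMLParser
from urllib.parse import urljoin


def fetch_html(url, browser="chromium"):
    """Dump the rendered DOM of url using headless Chromium."""
    result = subprocess.run(
        [browser, "--headless", "--dump-dom", url],
        capture_output=True, text=True, check=True, timeout=60,
    )
    return result.stdout


class ScriptCollector(HTMLParser):
    """Collect the src attribute of every <script> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def javascript_urls(html, base_url):
    """Return absolute URLs of the external scripts referenced in html."""
    collector = ScriptCollector()
    collector.feed(html)
    return [urljoin(base_url, src) for src in collector.sources]
```

Keeping the fetch and the parse separate means the HTML only has to be requested once, and any number of parsing modules can then work on the saved copy.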
[06:33.730 --> 06:40.830]  So one thing you always have to do for another module here is flesh out the scope of your target,
[06:40.830 --> 06:47.210]  which often involves pivoting from one TLD to another TLD that is related, but you don't
[06:47.210 --> 06:53.050]  necessarily even have the name of the domain, right? So you might be looking at Tesla and you're
[06:53.050 --> 06:57.350]  going to find other Tesla-related domains, but there are some Tesla-related domains that don't
[06:57.350 --> 07:03.070]  have Tesla in the domain itself. So one technique I like to use for that is following redirects
[07:03.070 --> 07:10.250]  to the target domain. And there can be flaws here. You can have other sites or territories
[07:10.250 --> 07:16.690]  or whatever that link to a different target, but aren't related. So you've got to do other
[07:16.690 --> 07:24.510]  checking to make sure that doesn't happen, but it is generally a fairly low noise, high signal
[07:25.170 --> 07:30.970]  method of finding new stuff. And to do this, one of the tools I like to use again is ipinfo.io
[07:30.970 --> 07:39.250]  and also host.io, which is related. And these are both a key part of my entire automation stack,
[07:39.250 --> 07:47.730]  because again, I'm writing small modules that call ipinfo.io and host.io explicitly to get one
[07:47.730 --> 07:52.890]  particular function and then produce an output from that. So they are how I do a lot of the lower
[07:52.890 --> 07:58.630]  level tasks, like getting ASNs for a company, getting IP ranges from ASNs, et cetera.
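The redirect technique mentioned above can be sketched without any particular data provider. This version assumes you already have a list of candidate domains from some other source, follows each one with the standard library, and keeps those that end up on the target. It is not the speaker's implementation, which calls ipinfo.io and host.io, and all names here are hypothetical.

```python
import urllib.request
from urllib.parse import urlparse


def final_host(url, timeout=10):
    """Follow redirects for url and return the hostname it ends up on."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return urlparse(response.geturl()).hostname


def lands_on_target(host, target_domain):
    """True if host is the target domain or one of its subdomains."""
    host = (host or "").lower().rstrip(".")
    target = target_domain.lower()
    return host == target or host.endswith("." + target)


def related_domains(candidates, target_domain, resolve=final_host):
    """Keep candidate domains whose redirect chain ends on the target."""
    related = []
    for domain in candidates:
        try:
            host = resolve("http://" + domain)
        except OSError:
            continue  # unreachable candidates are skipped, not fatal
        if lands_on_target(host, target_domain):
            related.append(domain)
    return related
```

As the talk notes, a redirect alone does not prove ownership, so the output is better treated as a list of leads for a follow-up validation module.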
[08:00.670 --> 08:05.510]  Once again, there are a million tools that you can do this with, but in my opinion,
[08:05.510 --> 08:10.990]  the key to solid Unix-y automation is having something as low level as possible
[08:12.250 --> 08:18.730]  and non-abstracted as possible that you trust. And I just happen to use ipinfo.io for that.
[08:18.730 --> 08:23.970]  Some people use Hurricane Electric for that. Other people use some of the DNS services where they have API
[08:23.970 --> 08:30.550]  access, or there's just a whole bunch of them out there. And this is just for my automation
[08:30.550 --> 08:35.050]  workflows, which is what we're talking about here. But if I want to go and do some manual exploring,
[08:35.510 --> 08:40.310]  and just check some stuff out, I'm extremely partial to Amass for that,
[08:40.310 --> 08:46.970]  especially since they keep adding functionality. It's now an OWASP project. And Jeff Foley,
[08:46.970 --> 08:52.170]  who runs that project, is just fantastic. So I really love Amass. It's my favorite
[08:52.850 --> 09:01.600]  all-arounder tool. I really enjoy it. All right. So now you've seen a number of modules,
[09:02.260 --> 09:06.580]  and you get the workflow for creating modules, right? And you understand that they have to be
[09:06.580 --> 09:14.140]  small. They have to have discrete output that is then consumed by another discrete piece of
[09:14.140 --> 09:20.140]  functionality. So here's what it starts to look like when you want to answer a complex question.
[09:20.140 --> 09:24.840]  Again, think Unix. The output of one becomes the input of another.
[09:25.620 --> 09:32.160]  And this is a simplified view of a workflow. So we're going from get company to company.
[09:32.800 --> 09:36.960]  We're getting domains. We're taking domains, and we're getting ASNs.
[09:36.960 --> 09:42.500]  We're taking ASNs. We're getting ranges, and we're ending up with ranges.txt.
[09:42.860 --> 09:49.340]  So this is a simplified view of a workflow. In actuality, you might have multiple submodules
[09:49.340 --> 09:55.920]  that add sources or do cleanup or validation on another module. So domains.txt, for example,
[09:55.920 --> 10:01.780]  might have five different modules feeding into it and maybe one or two cleanup mechanisms
[10:01.780 --> 10:06.720]  that go and prune out noise from there. Make sure no junk is added.
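One way to picture the cleanup step for domains.txt is a small module that merges the outputs of several source modules, normalises them, and drops anything that does not look like a domain. The validation pattern below is a deliberately conservative assumption (it rejects internationalised TLDs, for example), and the file and function names are hypothetical.

```python
import re
from pathlib import Path

# Conservative pattern: dot-separated labels of letters, digits, and hyphens,
# ending in an alphabetic TLD.
DOMAIN_RE = re.compile(
    r"^(?=.{1,253}$)([a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$"
)


def clean_domains(lines):
    """Normalise, validate, and de-duplicate candidate domains."""
    seen = set()
    for line in lines:
        domain = line.strip().lower().rstrip(".")
        if domain.startswith("*."):
            domain = domain[2:]  # drop wildcard prefixes
        if DOMAIN_RE.match(domain):
            seen.add(domain)
    return sorted(seen)


def merge_sources(source_files, output_file):
    """Merge several module outputs into one cleaned artifact."""
    lines = []
    for path in source_files:
        lines.extend(Path(path).read_text().splitlines())
    domains = clean_domains(lines)
    Path(output_file).write_text("".join(d + "\n" for d in domains))
    return domains
```

Because the result is sorted and de-duplicated, downstream modules such as the ASN lookup always see a stable, junk-free input.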
[10:08.520 --> 10:14.380]  But now take that to the nth degree, right? You can have one piece of output that feeds
[10:14.380 --> 10:20.160]  10 different modules, and all of those modules on the right can then, in turn, feed each other
[10:21.320 --> 10:26.800]  or produce their own outputs. And over the last five years, I've created like 50 of these things
[10:26.800 --> 10:34.780]  for my own use. And the fact that you can just automate them is completely insane. I'm actually
[10:34.780 --> 10:39.440]  in the process of putting some of these on GitHub. I meant to do that before this talk, but
[10:40.080 --> 10:45.900]  I should have some up soon. And they'll be the ones I'm least embarrassed to put out there.
[10:45.900 --> 10:48.400]  Probably do some cleanup before I release them.
[10:50.400 --> 10:53.440]  And that brings us to the automation piece, right?
[10:53.800 --> 11:00.500]  So there's lots of fancy ways to do automation, but this is all about, you know, really cheap
[11:00.500 --> 11:06.340]  Linux box. What can you do with the tools available? And you can just use Cron for
[11:06.340 --> 11:12.080]  automation. Comes free with Linux. And you can use it not only to run your modules,
[11:12.080 --> 11:16.640]  but also to send your notification when things are found. You just need to figure
[11:16.640 --> 11:22.420]  out what needs to finish before other things start and wire that all up. And you could use
[11:22.420 --> 11:28.480]  code inside the module to make sure one thing is finished before another. And some interesting
[11:28.480 --> 11:36.480]  stuff that's fairly obvious once you start wiring stuff up. And finally, once all the modules are
[11:36.480 --> 11:42.580]  running continuously via Cron, so you've got these discrete pieces of code producing discrete
[11:42.580 --> 11:50.620]  outputs. They're all wired up using Cron. They're running continuously. You can then rig them up to
[11:50.620 --> 11:56.900]  notify you when they find something. And this is super easy to do via email, Slack, or really
[11:56.900 --> 12:04.160]  anything with an API. So I'm really partial to Amazon SES for email. You can send tons of emails
[12:04.160 --> 12:09.580]  with it for like pennies a month. You could do it all from the command line. And of course,
[12:09.580 --> 12:12.960]  you could even set up your own Slack channel where you monitor your favorite targets or
[12:12.960 --> 12:18.260]  bounties or whatever. And send yourself a Slack if your automation finds something new.
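A notification module along these lines might compare the current output of a module with what it has already reported and only alert on the difference. The sketch below posts to a Slack incoming webhook using the standard library. The webhook URL, the file names, and the function names are all placeholders, and the message format is just one possible choice.

```python
import json
import urllib.request
from pathlib import Path


def read_lines(path):
    """Return the non-empty lines of a file, or an empty set if it is missing."""
    file_path = Path(path)
    if not file_path.exists():
        return set()
    return {line.strip() for line in file_path.read_text().splitlines() if line.strip()}


def new_findings(current_file, seen_file):
    """Return lines in current_file that were not recorded in seen_file."""
    return sorted(read_lines(current_file) - read_lines(seen_file))


def notify_slack(webhook_url, items):
    """Post a short message to a Slack incoming webhook."""
    payload = json.dumps({"text": "New findings:\n" + "\n".join(items)})
    request = urllib.request.Request(
        webhook_url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request, timeout=10).close()


def check_and_notify(current_file, seen_file, send):
    """Alert on unseen lines, then remember them for the next run."""
    fresh = new_findings(current_file, seen_file)
    if fresh:
        send(fresh)
        with open(seen_file, "a") as handle:
            handle.writelines(item + "\n" for item in fresh)
    return fresh
```

A cron job could call check_and_notify with a function that wraps notify_slack and a real webhook URL. Passing the sender in as an argument also makes it easy to swap in email instead.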
[12:21.690 --> 12:29.670]  So another thing to consider is how to collect and maintain and deploy all of this to the internet.
[12:29.670 --> 12:34.070]  Right? So we have the scripts themselves. You know, the code, the modules, whatever you want
[12:34.070 --> 12:42.350]  to call them. We have them automated via Cron. And we're now sending alerts with continuous
[12:42.350 --> 12:48.870]  monitoring that's going out via email or Slack or whatever system. But now the question is, okay,
[12:48.870 --> 12:54.330]  how do I build a box that does this? Like, you don't want to have it running on your home system,
[12:54.330 --> 13:00.350]  like shooting out of your home connection. It's a bad idea. So the natural way to do this is to
[13:00.350 --> 13:05.170]  build yourself a Linux box somewhere and just start hacking on it, right? And you start adding
[13:05.170 --> 13:10.630]  code and scripts or whatever. And you're pulling down some libraries and some modules and some
[13:10.630 --> 13:15.690]  third party tools or some open source tools, whatever. You're just, like, linking all this
[13:15.690 --> 13:21.250]  stuff up. And that works. But the problem is, once you want to replicate that somewhere else,
[13:21.250 --> 13:26.650]  you have to redo tons of work to make that box identical. So what I started doing a while back
[13:26.650 --> 13:33.810]  was using Terraform and Ansible combined with GitHub to manage all the code and the configs.
[13:34.010 --> 13:40.250]  So you have a self-contained directory for a new target that you want to monitor, like,
[13:40.770 --> 13:47.610]  you know, Verizon Media or Tesla or whatever program it is. So if I think up or hear about
[13:47.710 --> 13:54.970]  a new technique, I then make that change in the local copy and just redeploy the box using Terraform
[13:54.970 --> 14:01.490]  and Ansible. Or if I want to monitor a new target, I create a copy of that working box
[14:02.890 --> 14:08.930]  with the new seeds for the new target. And I just, you know, replicate that and then push it.
[14:08.930 --> 14:15.570]  And it goes out via Terraform and Ansible. And it builds itself inside of, you know,
[14:16.130 --> 14:23.250]  DigitalOcean or AWS, wherever you want to build it. And the crazy thing is, you can actually deploy
[14:23.810 --> 14:28.550]  and as soon as you press go, within a couple of minutes, the box comes up, it comes live.
[14:28.910 --> 14:36.810]  But because you have configured Cron already, it just starts monitoring, right? It just starts
[14:36.810 --> 14:41.330]  working automatically. And you immediately have your emails wired up. You immediately have your
[14:41.330 --> 14:47.490]  Slack wired up. And you can set up the variables such that it just starts working with the exact
[14:47.490 --> 14:53.370]  correct names of your new target. And it just starts finding all the domains, it starts finding
[14:53.370 --> 14:57.830]  all the subdomains, it starts pulling all the websites, it starts testing all the ports,
[14:57.830 --> 15:03.690]  it starts crawling, it starts, you know, doing vulnerability analysis on the sites themselves
[15:03.690 --> 15:08.270]  for cross-site scripting and RFIs and all the different techniques you want to use.
[15:08.270 --> 15:14.850]  All the automation just spins up and starts kicking off. And that is all connected to your
[15:14.850 --> 15:22.070]  alerting. So you can literally just go in, set up a new target locally in Terraform or Ansible,
[15:22.070 --> 15:29.230]  press go, and then within two or three minutes, you start getting the alerts. And that is like
[15:29.230 --> 15:38.360]  incredibly powerful. So what I love about doing automation in this way is that when I hear about
[15:38.480 --> 15:43.340]  a new technique, I don't just say, oh, that's cool. And maybe write it down or mostly forget
[15:43.340 --> 15:48.180]  about it. If it's really cool and I want to remember it, I make a note that I need to turn it
[15:48.180 --> 15:55.580]  into a module. And then I go in and add that module to Ansible in the local config, right?
[15:55.580 --> 16:02.200]  So, for example, Jason just posted a thing on Twitter, I think like last week or maybe the week
[16:02.200 --> 16:09.300]  before, about crawling CVE Details, the website, looking for URLs, because they're often talking
[16:09.300 --> 16:13.680]  about URLs that are sensitive or dangerous. Well, he was like, why don't I just crawl that and make
[16:13.780 --> 16:18.120]  a list of URLs? I was like, oh, that's super smart. So it's on my list of things to do to go
[16:18.120 --> 16:23.760]  create a module that does that. And I mean, I wouldn't have thought of that. I've thought of
[16:23.760 --> 16:28.740]  similar things, but I didn't think of that. And it's now on my list of things to do. And so you
[16:28.740 --> 16:34.120]  can go directly from a thing that you saw that you hadn't thought of, that someone else had a
[16:34.120 --> 16:39.620]  cool idea, and you can make a module out of it, which now incorporates that knowledge.
[16:39.740 --> 16:46.780]  It's just really frustrating to me to go to a talk or something, to be super excited about all
[16:46.780 --> 16:51.840]  the stuff that you hear. But then two weeks later, you're like, do I even remember any of that?
[16:52.260 --> 16:58.820]  Well, with an automation stack, you can convert your knowledge and your learnings
[16:58.820 --> 17:06.680]  into something tangible and repeatable. So everyone you're seeing on this slide here,
[17:06.680 --> 17:12.380]  everyone you see here is some combination of a hunter, a tester, and a content creator.
[17:12.380 --> 17:17.760]  And you should absolutely be following their work. So I mean, these people are putting out
[17:18.660 --> 17:24.420]  really cool tools. They're putting out YouTube videos. They're helping the community. They're
[17:24.420 --> 17:27.980]  super accessible. You can just ping them on Twitter and be like, hey, what do you think of
[17:27.980 --> 17:34.600]  this technique? Just very accessible, very knowledgeable, and producing great stuff for
[17:34.600 --> 17:40.040]  the community. So you should absolutely be following their work. Constantly learning
[17:40.040 --> 17:44.580]  stuff from them and many others like them in the community. And it's just really good to be
[17:44.580 --> 17:52.870]  tapped into people like this. All right. So that's what I wanted to talk about today.
[17:53.210 --> 17:58.490]  And as a quick summary, so the biggest takeaway from my approach to this is breaking up your
[17:58.490 --> 18:05.410]  testing into discrete commands. And you want to stay as close to a trusted source as possible
[18:05.410 --> 18:11.230]  when you're writing those commands. So avoiding abstraction. You want to make sure your output
[18:11.230 --> 18:17.940]  is extremely well named and clean. And goes into artifacts that can be used by other modules.
[18:18.450 --> 18:23.770]  You want to chain your commands together into a methodology and schedule them running,
[18:23.770 --> 18:28.290]  with Cron in this case. You can use whatever you want. But Cron's there and it's free.
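For the scheduling piece, one possible approach to making sure one module has finished before another starts is a small freshness check, in the spirit of make: a module runs only when its inputs exist and are newer than its own output. This is a sketch of that idea, not the speaker's code, and the crontab entries in the comments use made-up paths and times.

```python
import os


def inputs_ready(input_files, output_file):
    """True when every input exists and at least one is newer than the output."""
    if not all(os.path.exists(path) for path in input_files):
        return False  # an upstream module has not produced its artifact yet
    if not os.path.exists(output_file):
        return True  # nothing produced so far, so the module should run
    output_time = os.path.getmtime(output_file)
    return any(os.path.getmtime(path) > output_time for path in input_files)


# Hypothetical crontab entries (paths and times are placeholders):
#   0  * * * *  /opt/recon/get_domains.py
#   15 * * * *  /opt/recon/get_asns.py
#
# Inside get_asns.py, the module would start with something like:
#   if not inputs_ready(["domains.txt"], "asns.txt"):
#       raise SystemExit(0)
```

With a check like this, the cron times only need to be roughly ordered, because a module that fires too early simply exits and tries again on its next scheduled run.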
[18:29.570 --> 18:34.970]  And you want to lock in your configs into repeatable deployments using Axiom or Terraform
[18:34.970 --> 18:42.910]  or Ansible. Axiom is another really cool tool for this. Yeah. Really, really cool stuff. A guy
[18:42.910 --> 18:49.810]  named Ben made this. And it deploys automatically into DigitalOcean. And it's
[18:49.810 --> 18:55.710]  similar to what I'm using with Terraform and Ansible. You should check out that project.
[18:56.170 --> 19:00.170]  You want to follow the people I mentioned and stay aware of the newest techniques.
[19:00.530 --> 19:05.690]  They're putting out great stuff. You just need to follow them. Trust me on this. And you'll find
[19:05.690 --> 19:10.450]  other people to follow by following them. And finally, when your automation workflow
[19:10.450 --> 19:17.190]  bears fruit, go and hack on it manually for fun or profit or whatever.
[19:19.490 --> 19:22.770]  This is how you can get a hold of me. If you want to chat more about this,
[19:22.770 --> 19:26.890]  there's a number of people thinking about this problem right now and some really cool
[19:26.890 --> 19:33.130]  frameworks coming out around it as well. And yeah, if you want to chat about it, just hit me up.
[19:33.470 --> 19:38.230]  And thanks again to the Red Team Village for having me. We'll see you next time.
